近年来,通过开发可以为与训练图像相同的分布绘制的图像产生多样化和高质量描述的神经模型,从而实现了突破性的图像字幕研究。但是,当面对分布(OOD)图像(例如损坏的图像或包含未知对象的图像)时,模型无法生成相关字幕。在本文中,我们考虑了图像字幕中的OOD检测问题。我们提出问题,并提出评估设置,以评估模型在任务上的性能。然后,我们在检测和拒绝OOD图像时分析并显示标题的可能性得分的有效性,这意味着输入图像和生成的字幕之间的相关性封装在得分内。
translated by 谷歌翻译
参与者反复产生音节的Diadochokinetic语音任务(DDK)通常用作评估语音运动障碍的一部分。这些研究依赖于时间密集型,主观的手动分析,并且仅提供粗略的语音图片。本文介绍了两个深度神经网络模型,这些模型会自动从未注释,未转录的语音中分割辅音和元音。两种模型都在原始波形上工作,并使用卷积层进行特征提取。第一个模型基于LSTM分类器,然后是完全连接的层,而第二个模型则添加了更多的卷积层,然后是完全连接的层。这些模型预测的这些分割用于获得语音速率和声音持续时间的度量。年轻健康个体数据集的结果表明,我们的LSTM模型的表现优于当前的最新系统,并且与受过训练的人类注释相当。此外,在对帕金森氏病数据集的看不见的老年人进行评估时,LSTM模型还与受过训练的人类注释者相当。
translated by 谷歌翻译
学习一种新语言涉及不断比较语音作品与环境的参考作品。在言语获取的早期,孩子们进行了发音调整以符合他们的看护人的言论。一种语言的成年学习者调整他们的演讲以匹配导师参考。本文提出了一种合成产生正确的发音反馈的方法。此外,我们的目标是在保持演讲者的原始声音的同时产生校正后的生产。该系统提示用户发音短语。记录语音,并用与不准确音素相关的样品用零掩盖。该波形是对语音生成器的输入,作为具有U-NET体系结构的深度学习介绍系统实现,并经过培训以输出重建的语音。该训练集由未损坏的适当语音示例组成,并且对发电机进行了训练以重建原始的适当语音。我们评估了系统的性能在音素替代英语以及发音障碍儿童的最小对单词方面的性能。结果表明,人类听众稍微偏爱我们产生的语音,而不是用不同的扬声器的生产来平滑地替换音素。
translated by 谷歌翻译
声带煎炸或吱吱作响的声音是指以不规则的发光开口和低音为特征的语音质量。它以各种语言发生,并且在美国英语中很普遍,不仅可以标记词组结局,还用于社会语言因素和影响。由于其不规则的周期性,吱吱作响的声音挑战自动语音处理和识别系统,尤其是对于经常使用吱吱作响的语言。本文提出了一个深度学习模型,以检测流利的语音中的吱吱作响的声音。该模型由编码器和经过训练的分类器组成。编码器采用原始波形,并使用卷积神经网络学习表示。分类器被实现为多头完全连接的网络,该网络训练有素,可检测吱吱作响的声音,发声和音调,最后两个用于完善吱吱作响的预测。该模型经过对美国英语说话者的言语的培训和测试,并由训练有素的语音家注释。我们使用两个编码器评估了系统的性能:一个是为任务量身定制的,另一个是基于最新的无监督表示。结果表明,与看不见的数据相比,我们表现最佳的系统的回忆和F1得分有所改善。
translated by 谷歌翻译
Deep Neural Networks have recently gained lots of success after enabling several breakthroughs in notoriously challenging problems. Training these networks is computationally expensive and requires vast amounts of training data. Selling such pre-trained models can, therefore, be a lucrative business model. Unfortunately, once the models are sold they can be easily copied and redistributed. To avoid this, a tracking mechanism to identify models as the intellectual property of a particular vendor is necessary.In this work, we present an approach for watermarking Deep Neural Networks in a black-box way. Our scheme works for general classification tasks and can easily be combined with current learning algorithms. We show experimentally that such a watermark has no noticeable impact on the primary task that the model is designed for and evaluate the robustness of our proposal against a multitude of practical attacks. Moreover, we provide a theoretical analysis, relating our approach to previous work on backdooring.
translated by 谷歌翻译
The recent increase in public and academic interest in preserving biodiversity has led to the growth of the field of conservation technology. This field involves designing and constructing tools that utilize technology to aid in the conservation of wildlife. In this article, we will use case studies to demonstrate the importance of designing conservation tools with human-wildlife interaction in mind and provide a framework for creating successful tools. These case studies include a range of complexities, from simple cat collars to machine learning and game theory methodologies. Our goal is to introduce and inform current and future researchers in the field of conservation technology and provide references for educating the next generation of conservation technologists. Conservation technology not only has the potential to benefit biodiversity but also has broader impacts on fields such as sustainability and environmental protection. By using innovative technologies to address conservation challenges, we can find more effective and efficient solutions to protect and preserve our planet's resources.
translated by 谷歌翻译
As various city agencies and mobility operators navigate toward innovative mobility solutions, there is a need for strategic flexibility in well-timed investment decisions in the design and timing of mobility service regions, i.e. cast as "real options" (RO). This problem becomes increasingly challenging with multiple interacting RO in such investments. We propose a scalable machine learning based RO framework for multi-period sequential service region design & timing problem for mobility-on-demand services, framed as a Markov decision process with non-stationary stochastic variables. A value function approximation policy from literature uses multi-option least squares Monte Carlo simulation to get a policy value for a set of interdependent investment decisions as deferral options (CR policy). The goal is to determine the optimal selection and timing of a set of zones to include in a service region. However, prior work required explicit enumeration of all possible sequences of investments. To address the combinatorial complexity of such enumeration, we propose a new variant "deep" RO policy using an efficient recurrent neural network (RNN) based ML method (CR-RNN policy) to sample sequences to forego the need for enumeration, making network design & timing policy tractable for large scale implementation. Experiments on multiple service region scenarios in New York City (NYC) shows the proposed policy substantially reduces the overall computational cost (time reduction for RO evaluation of > 90% of total investment sequences is achieved), with zero to near-zero gap compared to the benchmark. A case study of sequential service region design for expansion of MoD services in Brooklyn, NYC show that using the CR-RNN policy to determine optimal RO investment strategy yields a similar performance (0.5% within CR policy value) with significantly reduced computation time (about 5.4 times faster).
translated by 谷歌翻译
The combination of conduct, emotion, motivation, and thinking is referred to as personality. To shortlist candidates more effectively, many organizations rely on personality predictions. The firm can hire or pick the best candidate for the desired job description by grouping applicants based on the necessary personality preferences. A model is created to identify applicants' personality types so that employers may find qualified candidates by examining a person's facial expression, speech intonation, and resume. Additionally, the paper emphasises detecting the changes in employee behaviour. Employee attitudes and behaviour towards each set of questions are being examined and analysed. Here, the K-Modes clustering method is used to predict employee well-being, including job pressure, the working environment, and relationships with peers, utilizing the OCEAN Model and the CNN algorithm in the AVI-AI administrative system. Findings imply that AVIs can be used for efficient candidate screening with an AI decision agent. The study of the specific field is beyond the current explorations and needed to be expanded with deeper models and new configurations that can patch extremely complex operations.
translated by 谷歌翻译
Correct scoring of a driver's risk is of great significance to auto insurance companies. While the current tools used in this field have been proven in practice to be quite efficient and beneficial, we argue that there is still a lot of room for development and improvement in the auto insurance risk estimation process. To this end, we develop a framework based on a combination of a neural network together with a dimensionality reduction technique t-SNE (t-distributed stochastic neighbour embedding). This enables us to visually represent the complex structure of the risk as a two-dimensional surface, while still preserving the properties of the local region in the features space. The obtained results, which are based on real insurance data, reveal a clear contrast between the high and low risk policy holders, and indeed improve upon the actual risk estimation performed by the insurer. Due to the visual accessibility of the portfolio in this approach, we argue that this framework could be advantageous to the auto insurer, both as a main risk prediction tool and as an additional validation stage in other approaches.
translated by 谷歌翻译
As language models (LMs) scale, they develop many novel behaviors, good and bad, exacerbating the need to evaluate how they behave. Prior work creates evaluations with crowdwork (which is time-consuming and expensive) or existing data sources (which are not always available). Here, we automatically generate evaluations with LMs. We explore approaches with varying amounts of human effort, from instructing LMs to write yes/no questions to making complex Winogender schemas with multiple stages of LM-based generation and filtering. Crowdworkers rate the examples as highly relevant and agree with 90-100% of labels, sometimes more so than corresponding human-written datasets. We generate 154 datasets and discover new cases of inverse scaling where LMs get worse with size. Larger LMs repeat back a dialog user's preferred answer ("sycophancy") and express greater desire to pursue concerning goals like resource acquisition and goal preservation. We also find some of the first examples of inverse scaling in RL from Human Feedback (RLHF), where more RLHF makes LMs worse. For example, RLHF makes LMs express stronger political views (on gun rights and immigration) and a greater desire to avoid shut down. Overall, LM-written evaluations are high-quality and let us quickly discover many novel LM behaviors.
translated by 谷歌翻译